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Abstract 

We introduce a novel family of convex non-quadratic invariant functionals that we employ to derive 
regularized solutions of ill-posed linear inverse imaging problems. The proposed regularizers involve 
the Schatten norms of the Hessian matrix, computed at every pixel of the image. They can be viewed 
as second-order extensions of the popular total-variation (TV) semi-norm since they satisfy the same 
invariance properties. Meanwhile, by taking advantage of second-order derivatives, they avoid the staircase 
effect, a common artifact of TV-based reconstructions, and perform well for a wide range of appUcations. 
To solve the corresponding optimization problems, we propose an algorithm that is based on a primal- 
dual formulation. A fundamental ingredient of this algorithm is the projection of matrices onto Schatten 
norm balls of arbitrary radius. This projection is performed efficiently based on a direct link we provide 
between vector projections onto £q norm balls and matrix projections onto Schatten-matrix norm balls. 
Finally, we demonstrate the effectiveness of the proposed methods through experimental results on several 
inverse imaging problems with real and simulated data. 
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A Hessian Schatten-Norm Regularization 
Approach For Solving Linear Inverse Problems 

I. Introduction 

Linear inverse problems arise in a host of imaging applications, ranging from microscopy and medical 
imaging to remote sensing and astronomical imaging [1]. The task is to reconstruct the underlying 
image from a series of degraded measurements. These problems are often formulated within a variational 
framework, where image reconstruction can be cast as the minimization of an energy functional subject 
to some penalty. The role of the penalty is significant, since it imposes certain constraints on the solution 
and considerably affects the quality of the reconstruction. 

The importance of choosing an appropriate penalty has initiated the development of regularization 
functionals that can effectively model certain properties of natural images. A popular regularization 
criterion is the total-variation (TV) seminorm [2] which has been successfully applied to several imaging 
problems such as image denoising, restoration [3], [4], [5], inpainting [6], zooming [7], and MRI 
reconstruction [8]. TV owes its success to its ability to preserve the edges of the underlying image 
well. Its downside, however, is that it introduces blocking artifacts (a.k.a. staircase effect) [9]. The reason 
is that TV favors vanishing first-order derivatives. Thus, it tends to result in piecewise-constant solutions 
even when the underlying images are not necessarily piecewise constant. This tendency is responsible for 
oversharpening the contrast along image contours and can be a serious drawback in many applications. 

A common workaround to prevent the oversharpening of regions with smooth intensity transitions is to 
replace TV by functionals that involve higher-order differential operators, because higher-order derivatives 
can potentially restore a wider class of images. Often, moving from piecewise-constant to piecewise-linear 
reconstructions offers a satisfactory improvement in the fitting of smooth intensity changes, so that most 
of the published functionals involve second-order differentials. Such regularizers have been considered, 
mostly for image denoising, either combined with TV [9], [10], [11], [12], [13] or in a standalone 
way [14], [15], [16], [17], [18]. These recent advances motivate us to investigate a class of regularizers 
that depend on matrix norms of the Hessian. These regularizers enjoy most of the favorable properties of 
TV; namely, convexity, contrast, rotation, translation, and scale invariance (up to a multiplicative constant), 
while they avoid the staircase effect by not penalizing first-order polynomials. 

The key contributions of this work are as follows: 
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1) The identification of a novel family of invariant functionals that involve Schatten norms of the 
Hessian matrix, computed at every pixel of the image. These are used in a variational framework to 
derive regularized solutions of ill-posed linear inverse imaging problems. Our functionals capture 
curvature information related to the image intensity and lead to reconstructions that avoid the 
staircase effect. 

2) A general first-order algorithm for solving the resulting constrained optimization problems under 
any choice of Schatten norm. The proposed algorithm relies on our derivation of a primal-dual 
formulation and copes with the non-smooth nature and the high dimensionality of the problem. 

3) A direct link between matrix projections onto Schatten-matrix norm balls and vector projections 
onto norm balls. This link enables us to design an efficient method for performing matrix 
projections. Although it is a fundamental component of our optimization algorithm, our result is 
not specific to the Hessian and can potentially have a wider applicability. 

The rest of the paper is organized as follows: In Section II, we discuss regularization functionals that 
are commonly used in imaging problems. Then, by focusing on invariance principles we derive our novel 
family of non-quadratic second-order functionals. In Section III, we present the discrete formulation of the 
problem and we describe the proposed optimization algorithm. In Section IV, we assess the performance 
of our approach for several linear inverse imaging problems with experiments on standard test images 
and real biomedical images. We conclude our work in Section V. Proofs of mathematical statements are 
given in Appendices. 



where / is an image, $7 C M^, D is the regularization operator (scalar or multi-component) acting on 
the image, and $ (•) is a potential function. Typical choices for D are differential operators such as the 
Laplacian (scalar operator) or the gradient (vectorial operator), while the potential function $ usually 
involves a norm distance. During many years, the preferred choice for the potential function has been the 
squared Euclidean norm, because of its mathematical tractability and computational simplicity. However, 
it is now widely documented that non-quadratic potential functions can lead to improved results; they 
can be designed to be less sensitive to outliers and therefore associated with better edge reconstruction 
capabilities. A typical example is the TV semi-norm, which for smooth images corresponds to the Li 
norm of the magnitude of the gradient. 



II. Derivative-Based Regularization 



The most commonly-used regularizers can be represented as 




(1) 
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Our present goal is to introduce new regularization functionals of the form of (1) which amounts to 
specifying some suitable linear operator D, and potential function To do so, certain requirements should 
be fulfilled. In particular, following the example of TV, we restrict ourselves to regularization operators 
that commute with translation and scaling, and potential functions that preserve these properties while 
introducing additional rotation invariance. Our motivation for enforcing these invariances is that, similarly 
to what is the case in many physical systems, one should opt for reconstruction algorithms that lead to 
solutions which are not affected by transformations of the coordinate system. An additional desirable 
requirement is that the regularizers should be convex to ensure the existence of a global minimum and 
to permit the design of efficient minimization techniques. 

A. Gradient Norm Regularization 

We would like our regularization operator to be translation and scale invariant. Therefore, a reasonable 
choice for D is some form of derivative operator. Based on this, we first characterize the complete class 
of gradient-based regularizers satisfying all the required invariances. This is accomplished by specifying 
the valid form for the potential functions according to the following theorem, whose proof is given 
in Appendix A. 

Theorem 1. Let TZ{f) be of the form (1), with D being the gradient operator. TZ{f) is a translation-, 
rotation-, and scale-invariant functional, if and only if the potential function ^ -.M? W is of the form: 
<^ (V/ (r)) = c I V/ (r)l'', where G R and c is an arbitrary constant 

As a direct consequence of Theorem 1, we see that the following gradient-based regularizers are the 
only choice of regularization satisfying the required invariance properties; ignoring the multiplicative 
constant c of the potential function, which can be absorbed by the regularization parameter, we get 



Since we are also interested in convex regularization functionals, we shall focus on cases where > 1 
in (2). A popular instance of convex functionals arises if we choose u = I, which corresponds to the TV 
functional. This regularizer enjoys an additional property, that of contrast invariance (1 -homogeneity). 

B. Hessian Schatten-Norm Regularization 

As already mentioned, the use of TV, which is the best representative of the gradient-based regulariza- 
tion family, suffers from certain drawbacks. Therefore, for the reasons specified in the introduction, we are 




(2) 
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interested in differential operators of higher-order and in particular of the second order. In A^-dimensions, 
the complete spectrum of second-order derivatives is embodied in the Hessian operator, 

friri (r) /r-ir-2 {^) 



(3) 



where f^^j.^ (r) = g^^g^ / (r)- Indeed, with the aid of the Hessian we can compute any second-order 
derivative of / (r) as ^/ (r) = uJHf {r)Vfp, where ug = {cos 9, s'mO) and v,^ = (cos(/), smcj)) are 
unit-norm vectors specifying the directions of differentiation and (•)^ denotes the transpose operation. 

Having specified the regularization operator D, the next step is to investigate which class of potential 
functions <& leads to translation-, rotation-, and scale-invariant second-order regularizers. Next, we 
provide Theorem 2 which completely characterizes the form of <I>, under these prerequisites. Before 
presenting this result, we first give the general definition of a Schatten matrix norm [19] that will 
be used in the sequel, and introduce some of the adopted notation. We denote the set of unitary 
matrices as U" = {X G C"^" : X^^ = X^}, where C is the set of complex numbers and (•)^ 
is the Hermitian transpose. We also denote the set of positive semidefinite diagonal matrices as 
|X € ]R"i^"^ : X (i, j) = Vi 7^ j}, where R+ is the set of real non-negative numbers. 



mixn2 



Definition 1 (Schatten norms). Let X G C"^^"^ be a matrix with the singular-value decomposition (SVD) 
X = USV^, where U G W and V G U"^ consist of the singular vectors of X, and S G ©"iX"^ 
consists of the singular values of X. The Schatten norm of order p (Sp norm) of X is defined as 

(min(ni,n2) \ p 

E , (4) 

where p > 1, and cTk (X) is the k-th singular value ofX, which corresponds to the {k, k) entry of XI. 

Definition 1 implies that the Sp norm of a matrix X corresponds to the Ip norm of its singular-values 
vector a (X) G R^i'^^"!'"^). This further means that all Schatten norms are unitarily invariant. Moreover, 
we note that the family of Sp norms includes three of the most popular matrix norms, i.e., the nuclear/trace 
norm {p = 1), the Frobenius norm {p = 2) and the spectral/operator norm {p = oo). 

Theorem 2. Let TZ (/) be of the form (1), with D being the Hessian operator. TZ (/) is a translation-, 
rotation-, and scale -invariant functional, if and only if the potential function : M^^^ i— > M /i' of the form: 
$ {T-Lf (r)) = $0 {\f (r) / ||^/ (r)||^ ^ \\Hf (r)||^ , where G M and <^o <^ zero-degree homogeneous 
function of the Hessian eigenvalues Xf (r). 
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The proof of Theorem 2 is given in Appendix A. Now, according to it, the admissible second-order 
regularizers, with respect to the invariance properties of the coordinate system, are those depending on 
the Schatten norms of the Hessian. If we set <I>o = 1> we obtain the following regularization family 



To further ensure convexity we need to impose that u > 1. Finally, for our regularizers to also enjoy the 
contrast-invariance property (similar to TV), we focus on the case where v = 1. Consequently, we define 
our proposed family of non-quadratic second-order regularization functionals as 



The introduced functionals, depending on the Hessian, lead to piecewise-linear reconstructions. These 
reconstructions can better approximate the intensity variations observed in natural images than the 
piecewise-constant reconstructions provided by TV. Thus, they are able to avoid the staircase effect. 
Moreover, since the Hessian of / at coordinates r is a 2 x 2 symmetric matrix, the SVD in the Schatten 
norm definition reduces to the spectral decomposition and the singular values correspond to the absolute 
eigenvalues, which can be computed analytically. Now, if we consider the intensity map of the image 
as a 3-D differentiable surface, then the two eigenvalues of the Hessian at coordinates r correspond to 
the principal curvatures. They can be used to measure how this surface bends by different amounts in 
different directions at that point. Therefore, the proposed potential functions, which depend upon those, 
can be interpreted as scalar measurements of the curvature at a local surface patch. For example, the 
^2 norm (Frobenius norm) of the Hessian is a scalar curvature index, commonly used in differential 
geometry, which quantifies lack of flatness of the surface at a specific point. We can thus safely state 
that the proposed regularizers incorporate curvature information about the image intensity. 

Finally, we note that the regularizers obtained for two choices of p = 2, oo, coincide with functionals 
we considered in our previous work in [20], where we followed another path for extending TV based on 
rotational averages of directional derivatives. To the best of our knowledge, the Hessian Schatten norms 
for p 7^ 2, oo have not been considered before in the context of inverse problems. 

HI. Variational Image Reconstruction 

From now on, we focus on the discrete formulation of the image reconstruction problem. Hereafter, 
in order to avoid any confusion between the continuous and the discrete domains we will use bold-faced 
symbols to refer to the discrete manipulation of the problem. 




(5) 




(6) 
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A. Discrete Problem Formulation 

Our approach for reconstructing the underlying image from the measurements is based on the Unear 
observation model 

y = Ax + w , (7) 

where A G j^A^x^^ a matrix that models the spatial response of the imaging device, while y G and 
X G are the vectorized versions of the observed image and the image to be estimated, respectively. 
Apart from the effect of the operator A acting on the underlying image, another perturbation is the 
measurement noise, which is intrinsic in the detection process. This degradation factor is represented in 
our observation model by w that we, here on, will assume to be i.i.d Gaussian noise with variance cx^. 

The recovery of x from the measurements y belongs to the category of linear inverse problems. Usually, 
for the cases of practical interest, it is ill-posed [21]: the operator A is either ill-conditioned or singular. 
This is dealt with in the variational framework by forming an objective function 

= ^||y- Ax||^ + rV^(x) , (8) 

whose role is to quantify the quality of a given estimate. The first term, also known as data fidelity, 
corresponds to the negative Gaussian log-likelihood and measures how well a candidate estimate explains 
the observed data, while the second one, the regularization term, encodes our beliefs about certain 
characteristics of the underlying image. The role of the regularizer is to narrow down the set of plausible 
solutions by penalizing those that do not satisfy the assumed properties. The parameter r > provides 
a balance between the contribution of the two terms by quantifying the degree of "trust" we show to the 
measurements and to our image model. The image reconstruction problem is then cast as the minimization 
of (8) and leads to a penalized least-squares solution. 

B. Discrete Hessian Operator and Basic Notations 

In this work we focus on the class of Hessian Schatten-norm regularizers presented in (6). In the 
sequel, we use H to refer to the discrete version of the Hessian operator. To simplify our analysis, we 
assume that the image intensities on a x A^j, grid are rasterized in a vector x of size N = • Ny, 
so that the pixel at coordinates {i ,j) maps to the nth entry of x with n = jN^ + (i + 1). In this case, 
the discrete Hessian operator is a mapping 1-L : ^ X, where X = M^^^^^. For x G M^, 'Hx is 
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(9) 



given as 

where [•]^^ denotes the nth element of the argument, n = 1 , . . . , N , and A^in^ ^r^r^' ^''^^ A^ra denote 
the forward finite-difference operators [22], which provide the discrete approximations of the second-order 
partial derivatives along the two dimensions of the image. If we assume Neumann boundary conditions 
(mirror image plane), these operators are defined as 



[AraraX 



Xi+2,j - 2a;i+ij + Xij, I <i < - 2, 
XN^-l,j - XN^j, i> N^-l, 

Xij+2 - 2xij+i + Xij, I < j < Ny - 2, 
Xi,Ny-i - Xi^Ny, j > Ny -1, 

Xi+lJ-{-l Xi^ij 

— Xjj+i + Xij, I < i < Nx — 1 and 1 < j < A^j; — 1, 



(10a) 



(10b) 



(10c) 



0, otherwise . 

Note that, in the above definitions, for ease of notation, we do not use the vectorized representation of 
the image but rather the standard one. 

We equip the space X with the inner product (• , •)^ and norm \\-\\;^. To define them, let X, Y G Af, 
with X„, YneR'^^'^y n = l,...,N. Then we have 



N 



{X,Y)^ = J2iT (Y^X„) 



(11) 



n=l 



and 



||X||;,= ^(X,X);,, (12) 

where tr (•) is the trace of the input matrix. For the Euclidean space we use the standard definition 
of the inner product and of the norm. We denote them by (• , and ||-||2, respectively. 
The adjoint of H is the discrete operator "K* : Af i— such that 

{Y,nx)^ = {n*Y,^)^. (13) 
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This definition of the adjoint operator is a generaUzation of the Hermitian transpose, which holds for 
matrices. Based on the relation of the inner products in (13), we show in Appendix B that for any Y G Af, 
it holds that 



A* Y^i'i)] + [a* /"y^^'^) + y(2.i) 



rir2 



+ 



(14) 



where Y^*'-'^ is the entry of the 2x2 matrix Y„ and A*^,.^, A*^^,^, A*^^^ are the adjoint operators 
that correspond to backward difference operators with Neumann boundary conditions. 



C. Majorization-Minimization Algorithm 

We now present a general method to compute the minimizer of the functional in (8), under any 
Hessian-based Sp norm regularizes Since these regularizers are non-smooth, our algorithm will be based 
on a majorization-minimization (MM) approach (cf. [23], [24], [25] for instance). Under this framework, 
instead of directly minimizing (8), we find the solution via the successive minimization of a sequence 
of surrogate functions that upper bound the initial objective function [26]. Our motivation for taking this 
path is that each of the surrogate functions is simpler to minimize, and we can rely on a gradient scheme 
that efficiently copes with the large dimensionality of the problem. 

To obtain the surrogate functions, we upper bound the data term of our objective function using the 
following majorizer [23], [27] 

1 2 

5((x,xo) = - ||y - Ax||2 + d(x,xo) , (15) 

where d (x, xo) = ^ (x — xq)^ [al — A^A] (x — xq) is a function that measures the distance between 
X and Xq. To come up with a valid majorizer we need to ensure that ci(x, xq) > 0, Vx 7^ xq, with 
equality if and only if x = xq. This prerequisite is true if al — A'^A is positive definite, which implies 
that a > llA^Ajl^. The upper-bounded version of the overall objective (8) can be written as 

o; 2 

ip (x, Xo) = g (x, Xo) + Tip (x) = - ||x - ZII2 + (x) + c , (16) 

where c is a constant and z is given by 

z = xo + a~^A^ (y - Axo) . (17) 

Then, the next step is to iteratively minimize (16) w.r.t x, setting xq to the previous iteration's solution. As 
we see, in this new objective function there is no coupling between x and the operator A anymore, which 
turns the minimization task into a much simpler one. In fact, the minimizer of (16) can be interpreted 
as the solution of a denoising problem with z being the noisy measurements. 
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D. Proximal Map Evaluation and Matrix Projections 

The MM formulation of our problem relies on the solution of a simpler problem of the form 

1 2 

X = argmin - ||x - zjjg + (x) + ic (x) , (18) 

x6K" ^ 

where ic is the indicator function of a convex set C that represents additional constraints on the solution, 
such as positivity or box constraints. The convention is that lc (x) takes the value for x e C and oo 
otherwise. If i? (x) = rip (x) + lq (x) is a proper, closed, convex function, then the solution of (18) is 
unique and corresponds to the value of the Moreau proximity operator [28], defined as 

1 2 

prox^ (z) = argmin - ||x — z||2 + (x) . (19) 

The proximal map of "d (x) cannot always be obtained in closed-form, and this is also the case for the 
regularizers under study. For this reason, we next present a primal-dual approach that results in a novel 
numerical algorithm, which can efficiently compute the solution. 

A fundamental ingredient of our proposed algorithm is the orthogonal projection of matrices onto 
Sq norm balls. This projection can be performed efficiently based on the following proposition, which 
provides a direct link between vector projections onto norm balls and matrix projections onto Sq norm 
balls. This result is new, to the best of our knowledge. A relevant result that can be considered as a 
converse statement of Proposition 1 can be found in [29, Theorem A.2]. 

Proposition 1 (Schatten Norm Projections). Let Y G C"iX"2 ^ matrix with SVD decomposition 

Y = UEV^, where U G W\ V G U"^ and E G The orthogonal projection ofY onto the set 
Bs^ = {x G C"i><"^ : 11X11^^ < p} is given by 

Vbs, (Y) = Vdiag {Vb, (ct (Y))) , 

where diag (•) is the operator that transforms a vector to a diagonal matrix and Vb^ is the orthogonal 
projection onto the £q norm ball i3g = |v G . ||v||^ < p| of radius p. 

Based on Proposition 1, whose proof is given in Appendix B, we design an algorithm for the orthogonal 
projection of a matrix Y onto the convex set Bs^- The algorithm consists of three steps: (a) decompose 

Y in its singular vectors and singular values by means of the SVD; (b) project its singular values onto 
the corresponding £q norm ball Bq', and (c) obtain the projected matrix via singular value reconstruction 
(SVR) using the projected singular values and the original singular vectors. 
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We next describe all of the steps leading to the proposed algorithm that solves the problem 

argmin - ||x — + r ||^x||-|^ \fp>l. (20) 
XGC 2 

With ||-Hx|| ]^ we denote the discrete version of our proposed regularization family (6), where 
stands for the mixed d-i-Sp norm, which for an argument * = , *2 > ■ ■ ■ ' *Ar] G Af is defined as 

AT 

ll*lli,P = Ell*"ll5. '^^'^l- (21) 

n=l 

The discrete form of our regularizers highlights their relation to the sparsity-promoting group norms, 
which are commonly met in the context of compressive sensing (see [30], for instance). However, a 
significant difference is that in our case the mixed norm is a vector-matrix norm rather than a vector- vector 
norm. Therefore, while the machinery we are using shares some similarities with the one employed in the 
group vector-norm case, there are important differences, with the most pronounced being the projection 
step. 

Since the operator of our choice is the Hessian, which produces 2x2 symmetric matrices at every 
coordinate of x, G (21), where = {X G M2x2 . = x}. However, for reasons of 

completeness, in the following lemma, where we derive the dual of the ii-Sp norm, we consider the 
more general case G C"^^"^ The proof of Lemma 1 is provided in Appendix B and follows a 
similar line of thought with the one presented in [31, Lemma 1]. The latter is about the dual norm of a 
mixed i\-lp vector norm. 

Lemma 1. Let p > 1, and let q be the conjugate exponent of p, i.e., ^ + | = 1. Then, the mixed norm 
IHloo q dual to the mixed norm 

Using Lemma 1 and noting that the dual of the dual norm is the original norm [32], we write (21) in 
the equivalent form 

ll*lli,P= max (fi, , (22) 
where Boo^q denotes the ioo-Sq unit-norm ball, defined as the set 

Boo,g = {n= [nj,nl,...,nlf ex : < i,v?i = i,...,iv} . (23) 



This alternative definition of the mixed ii-Sp norm allow us to express it in terms of an inner product 
that involves the dual variable Q and the unit-norm ball Boo.g- Moreover, from (23) it is straightforward 
to see that the orthogonal projection onto Boo,q is obtained by projecting separately each submatrix ri„ 
onto a unit-norm Sq ball (Bs^). 

September 18, 2012 DRAFT 
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Using (22) we re-write (20) as 

X = arg min — ||x — zllg + T max {Q. , T-Lyi) -^^ . (24) 
This formulation naturally leads us to the following minimax problem 



min max C (x, fi) , (25) 

X6C neBoo,,, 



where 



£(x,n) = ^||x-z||2 + T(n,^x)^ 

= ^||x-z||^ + T(^*n,x)2 . (26) 

Since the function C (x, fi) is strictly convex in x and concave in fi, we have the guarantee that a 
saddle- value of C is attained [32], and, thus, the order of the minimum and the maximum in (25) does 
not affect the solution. This means that there exists a saddle -point ^x, tl^ that leads to a common value 
when the minimum and the maximum are interchanged, i.e., 

min max £ (x, fi) = £ ( x, ) = max min C (x, fi) . (27) 

xec neBoc,, V / neBoc,, xec 

Based on this observation, we can now define the primal and dual problems by identifying the primal 
and dual objective functions, respectively. The l.h.s of (27) corresponds to the minimization of the primal 
objective function g (x), and the r.h.s to the maximization of the dual objective function s (fi), 

1 2 

^(x)= max £ (x, $1) = - ||x - z||2 + r ||?^x||^^p , (28) 
s (fi) = min £ (x, fi) 

= ^||PC(Z-T?^*fi)-(Z-T?^*n)||^ 

+ i||z||^-^||z-T?^*n||^ , (29) 

where Vq is the orthogonal projection onto the convex set C. Therefore, (27) indicates that we can obtain 
the minimizer x of (x) from the maximizer of s (fi) through the relation 

i = (z - TU*tl^ . (30) 

This last relation is important, since in contrast to the primal problem (20), which is not continuously 
differentiable, the dual one involves the smooth function s (Q). We can therefore solve it by exploiting 
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its gradient. Indeed, using the property that the gradient of a function /i (x) = ||x — "Pc (x)||2 is well 
defined and is equal to V/i (x) = 2 (x — (x)) [5, Lemma 4.1], we compute the gradient of s (Q) as 

Vs{n) = Tnvc{z-Tn*n). (3i) 

Therefore, the solution of our primal problem (20) is obtained in two steps: first we find the maximizer 
of the dual objective function (29) as described next, and then we obtain the solution through (30). 

E. Maximization of the Dual Objective 

At this point, a main issue we need to deal with, is that the Hessian operator does not have an 
empty null space and, thus, a stable inverse does not exist. Consequently, we cannot opt for a closed-form 
solution for the maximizer of s (CI). This means that we have to resort to a numerical iterative scheme. 
In this work, we employ Nesterov's iterative method [33] for smooth functions. This is a gradient-based 
scheme that exhibits convergence rates of one order higher than the standard gradient-ascent method. 
To ensure convergence of the algorithm, we need to choose an appropriate step-size. Since our dual 
objective is smooth with Lipschitz continuous gradient, we can use a constant step-size, thus, avoiding a 
line search at every iteration. An appropriate step-size is equal to the inverse of the Lipschitz constant of 
Vs (f2). We derive an upper bound of this Lipschitz constant in the following proposition, whose proof 
is given in Appendix B. 

Proposition 2. Let L (s) denote the Lipschitz. constant ofVs (ft) of the dual objective function defined 
in (29). Then, it holds that 

L (s) < 64r2 . (32) 

From (27) and (29) it is clear that the maximizer of our dual objective can be derived by solving the 
constrained maximization problem 

n = argmax- ||Pc (z - Tn*n) - (z - TU*n)\\l - - ||z - TH*n\\l . (33) 

neBoo,, 2 2 

A necessary step towards this direction is to compute the projection onto the set B^^q, defined in (23). 
This operation is accomplished by projecting independently each of the N components fi„ of Q, onto 
the set Bs^ = |xgS^:||X||^ — "^^^^ projection is performed efficiently following the three steps 
of the algorithm we proposed in Section III-D, which is based on our Proposition 1. 

Steps (a) and (c) are fairly easy to implement. Specifically, since the matrices of interest are 2x2 
symmetric, we compute the SVD and the SVR steps in closed-form. Then, the most cumbersome part 
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of our algorithm is the norm projection of the singular values, which for general values of q does 
not exist in closed form. Fortunately, this operation is still feasible thanks to the recently developed Iq 
norm projection algorithm [31]. This projection method is based on an efficient proximity algorithm for 
Iq norms [34]. Moreover, in Section III-F we report three cases of Sq norms, q = l,2,oo, where the 
projection can be evaluated in closed-form. 



F. Closed Form of Projections for Sq = 1, 2, oo. 

From Proposition 1, we know that the matrix projection onto the unit-norm ball is associated with 
the projection of the singular values of the matrix onto the B2 ball. The latter is computed by normalizing 
the elements of the corresponding vector by their Euclidean norm. Therefore, we have that 



(34) 

ri„ , if ||r2„, iipi < 1 . 



This situation is advantageous since it allows us to avoid both the SVD and the SVR steps. Consequently, 
this drastically reduces the complexity of computing the projection. To compute the projection onto Bs^ , 
we use that the projection onto the Boo unit-norm ball corresponds to setting the elements that have an 
absolute value greater than one to one, and adding back their original sign. Therefore, we readily get 

Vbs^ i^n) = U^diag (mill (cr (n„) , 1)) V , (35) 

where 1 is a vector with all elements set to one and the min operator is applied component-wise. Note 
that, this result is directly related to the singular value thresholding (SVT) method [35], [36] developed 
in the area of matrix rank minimization. The derivation of SVT in [35], [36] is technical. It relies on the 
characterization of the subgradient of the nuclear norm [37]. By contrast, in our case the result comes 
out naturally as an immediate consequence of Proposition 1 and of the duality between the spectral and 
nuclear matrix norms. Finally, the projection of a matrix onto the Bs^ unit-norm ball is related to the 
projection of its singular values onto the Bi unit-norm ball. The latter is computed by the soft-thresholding 
operator (cr (ri„)) = max (cr (fin) — 7, 0) [38], where the max operator is applied component-wise. 
Therefore, based on Proposition 1, we have that 

Vbs, i^n) = U^diag (5^ (cr (fi,))) V . (36) 

This last projection cannot in general be computed in closed form. The reason is that the threshold 7 
is not known in advance and we have to estimate it. This can be accomplised using one of the existing 
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methods available in the literature [39], [40], [41], [42]. Fortunately, in our case, the singular vectors are 
of low dimensionality, a (fin) G and, thus, we derive the selection rule for 7 analytically, as 

,if cri(n„) < l-cr2(n„), 

7 = <; ,if 1-^2 (fin) <^l + ^2 (^^n), (37) 

cri(f2„)-l ,if cri(n„) > l + o-2(n„), 

where the singular values are sorted in a decreasing order, i.e., cri (fi^) > C2 (f^n)- 

G. Numerical Algorithm 

Equipped with all the necessary ingredients, we conclude with a summarized description of the complete 
optimization algorithm. Our method consists of two components that interact. The first component is 
responsible for the majorization of the objective function, as we described in Sec. III-C, while the second 
one undertakes the minimization of the resulting upper-bounded version. Then, the algorithm proceeds by 
iteratively minimizing the majorizer that is formed based on the solution of the previous iteration. Since the 
convergence of this scheme can be slow in practice, to speed it up we employ the FISTA algorithm [25]. 
This method exhibits state-of-the-art convergence rates by combining two consecutive iterates, in an 
optimum way. A description of our image reconstruction approach that is based on the monotone version 
of FISTA (MFISTA) [5] is given in Algorithm 1. The sub-routine denoise corresponds to the second 

component that finds the solution of (20). This minimizer is related to the proximal map prox^n-^ n 

II iii,p 

but, from a signal processing perspective, we can also interpret it as a denoising step under Hessian-based 
li-Sp norm regularization. The computation of the denoise sub-routine is described in Algorithm 2 
and is based on the primal-dual formulation that we proposed in Sees. III-D and III-E. 

Finally, regarding the computational complexity of the algorithm, it is only mildly higher than that 
of TV's. The extra computational cost is due to (a) the use of a tensor (Hessian) instead of a vectorial 
(gradient) operator and (b) the projections onto the Bs^ balls instead of the B2 ball. Our projections are 
somewhat more expensive because of the SVD and SVR steps. However, these steps are computed in 
closed form. It is also worth mentioning that the proposed algorithm is highly parallelizable, since all 
the involved operations are performed independently for each pixel of the image. 

IV. Experimental Results 

To validate the effectiveness of our proposed Hessian Schatten norm regularization framework, we 
provide experimental results for different linear inverse imaging problems. In particular, we consider the 
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Algorithm 1 : Image reconstruction algorithm under Hessian-based ii-Sp norm regularization. 
Input: y, A, r > 0, p > 1, Of > ||A"'"A||, Pc- 
Initialization: ui = xg, ti = I, ci = ip (xq). 
while stopping criterion is not satisfied do 

s„ ^ denoise (u„ + a^^A^ (y - Au„) , ^,p,Vc)\ 

^■n+l < 2 ' 

Cn+l = <^(S„); 

if Cn+i > c„ then 

^ x„ ; 



u„+i <- X, 

else 
end 

ra <— n + 1; 
end 

return x„; 



(„-l 



Algorithm 2 : cienoise(z, r,p, Pc) - denoising algoritiim under Hessian-based £i-Sp norm regulariza- 



tion^ 

Input: z, r > 0, p > 1, Vc- 

Initialization: = e X, ti = 1. 

Output: x - optimal solution of (20). 

while stopping criterion is not satisfied do 



n+l ^ 2 



ra + 1; 
end 

return Pc (z — T'H*n„_i); 



problems of image restoration/deblurring, sparse reconstruction from random samples, image interpolation 
and image zooming. Further, we restrict our comparisons to methods that fall in the category of derivative- 
based regularization techniques satisfying the invariance properties studied in Section II. The benchmark 
for our evaluation is TV-regularization which is popular and widely used in practice for several inverse 
problems. For all the problems but image deblurring, we also provide comparisons with quadratic 
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Fig. 1. Set of standard test images. From left to right and top to bottom: Boat, Face, Fingerprint, Fluorescent Cells, Hill, 
House, Kids, Lena, Peppers and Wall. 

derivative-based regularizers. 
A. Restoration Setting 

For the image deblurring experiments, we use a set of 10 grayscale images shown in Figure 1. Six of 
them are well-known standard test images of size 512 x 512 (except the Peppers image which is of size 
256 X 256), while the rest of them were obtained from [43] and are of size 255 x 255. Since the last 
four images have intensity values in the range of [0 , 1] we normalize the intensities of all images in the 
same range. 

The performance of the methods under comparison is assessed for various blurring kernels and different 
noise levels. In particular, in our experiments we employ three point spread functions (PSFs) to produce 
blurred versions of the images. We use a Gaussian PSF of standard deviation fjfe = 4, a moving average 
(uniform) PSF, and a motion blurring kernel that was obtained from [43]. The first two PSFs have a 
support of 9 X 9 pixel while the third one has a support of 19 x 19 pixel. As an additional degradation 
factor we consider Gaussian noise of three noise levels corresponding to a blurred signal-to-noise-ratio 
(BSNR) of {15, 20, 25} dB, respectively. The BSNR is defined as BSNR = var (Ax) /al where var (Ax) 
is the variance of the blurred image and is the standard deviation of the noise. 

Regarding the restoration task, the minimization of the objective functions is performed under box 
constraints. This means that the intensities of the restored images are constrained to lie in the convex 
set C = {x G R^\xn G [0 , 1] Vn = 1, . . . , NY To accomplish this, we use the corresponding projection 
operation, Vc, which for a vector x amounts to setting the elements that are less than zero and greater to 
one to zero and one, respectively. For the constrained minimization of the objective functions that involve 
the Hessian Schatten norms we use the method described in Section III, while for the TV-based one we 
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employ the algorithm of [5]. The latter belongs to the same category of minimization algorithms as ours 
with a comparable convergence behavior. The rationale for this choice is that, the quality of the restoration 
will not depend on the choice of the minimization strategy but rather on the choice of the regularizer. In 
all cases, the stopping criterion is set to either reaching a relative normed difference of 10"^ between two 
successive estimates, or a maximum of 100 MFISTA iterations. We also use 10 inner iterations for the 
solution of the corresponding denoising problem. Moreover, instead of using the true PSF that produces 
the blurred images, we use a slightly perturbed version by adding Gaussian noise of standard deviation 
10"'^. The motivation is to test the performance of the algorithms under more realistic conditions, since, 
in practice the employed PSF normally contains some error and thus deviates from the true one. Finally, 
in all the reported experiments the quality of the reconstruction is evaluated in terms of an increase in the 
SNR (ISNR), measured in dB. The ISNR is defined as ISNR = lOlogio (MSEin/MSEout), where MSEin 
and MSEout are the mean-squared errors between the degraded and the original image, and the restored 
and the original image, respectively. 

B. Image Restoration on Standard Test Images 

In Table I we provide comparative restoration results for all the test images and all the combinations 
of degradation conditions (PSF and noise level). To distinguish between the different Hessian-based 
regularizers, we refer to them as T-LSk with k denoting the order of the Schatten norm. We report the 
results obtained by using Schatten norms of order one, two and infinity, which correspond to the well- 
known nuclear, Frobenius and spectral matrix norms, respectively. For the sake of consistency among 
comparisons, the reported results for each regularizer, including TV, are derived using the individualized 
(w.r.t. degradation conditions) regularization parameter r, that gives the best ISNR performance. 

For all tested images and degradation conditions the Hessian-based regularization framework leads 
to improved quantitative results compared to those of TV. The best SNR improvement, on average, is 
achieved for the T-LSi regularizer, while comparable results are also obtained for the 'HS2 regularizer. 
While the T-LSoo regularizer outperforms TV most of the time, as well, the improvement is less pronounced 
than that of the other two regularizers. We can thus conclude, that as the order of the Schatten norm moves 
closer to 1, the reconstruction results improve. This can be attributed to the fact that, in the extreme case 
of order infinity, the corresponding regularizer takes into account only the maximum absolute eigenvalue 
and thus fails to include additional information possibly provided by the second eigenvalue. Overall, 
the improvement in performance over TV can be quite substantial (more than 0.5 dB), which justifies 
Hessian-based regularization as a viable alternative. 
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TABLE I 

ISNR COMPARISONS ON IMAGE RESTORATION FOR THREE BLURRING KERNELS AND THREE NOISE LEVELS 



Image / BSNR 


Gaussian blur; 9x9 


UnifoiTn blur: 9 x 9 


Motion blur: 19 x 19 


TV 




HSi 


■HSi 


TV 






HSi 


TV 




HS2 


HSi 




15 dB 


2.59 


2.63 


2.71 


2.75 


3.20 


3.22 


3.32 


3.36 


3.88 


3.80 


3.94 


4.03 


o 
CQ 


20 dB 


2.95 


3.03 


3.12 


3.18 


3.80 


3.84 


3.95 


4.02 


5.02 


5.09 


5.26 


5.35 




25 dB 


3.81 


3.88 


4.00 


4.07 


4.77 


4.81 


4.96 


5.05 


6.78 


6.94 


7.12 


7.19 




15 dB 


3.60 


4.32 


4.43 


4.45 


3.71 


4.38 


4.48 


4.51 


4.61 


5.80 


5.92 


5.96 




20 dB 


3.96 


4.71 


4.84 


4.88 


4.30 


4.95 


5.08 


5.13 


5.99 


7.32 


7.48 


7.53 




25 dB 


4.78 


5.62 


5.80 


5.84 


5.41 


6.16 


6.35 


6.41 


7.93 


9.26 


9.44 


9.49 


_'c 


15 dB 


3.91 


5.05 


5.15 


5.13 


4.15 


5.27 


5.37 


5.37 


4.28 


6.05 


6.10 


6.08 


a. 

CJj 


20 dB 


4.68 


5.85 


5.95 


5.92 


5.16 


6.39 


6.49 


6.46 


6.39 


8.13 


8.20 


8.16 


e 


25 dB 


5.80 


6.84 


6.92 


6.86 


6.46 


7.60 


7.70 


7.64 


8.65 


10.10 


10.18 


10.14 


6 


15 dB 


2.84 


3.61 


3.66 


3.65 


3.26 


3.92 


3.98 


3.99 


3.86 


4.52 


4.62 


4.64 




20 dB 


2.84 


3.62 


3.69 


3.70 


3.50 


4.20 


4.27 


4.28 


4.68 


5.50 


5.63 


5.66 


1 


25 dB 


3.48 


4.32 


4.40 


4.41 


4.30 


5.04 


5.12 


5.13 


6.32 


7.23 


7.36 


7.37 




15 dB 


2.85 


2.95 


2.99 


3.00 


3.27 


3.32 


3.38 


3.40 


3.56 


3.69 


3.79 


3.84 


s 


20 dB 


2.65 


2.74 


2.80 


2.82 


3.30 


3.39 


3.46 


3.50 


4.05 


4.22 


4.34 


4.39 




25 dB 


3.07 


3.22 


3.29 


3.32 


3.93 


4.06 


4.15 


4.18 


5.39 


5.51 


5.63 


5.64 




15 dB 


2.91 


3.28 


3.33 


3.33 


3.57 


3.87 


3.94 


3.95 


4.20 


4.75 


4.86 


4.88 


s 

o 

s 


20 dB 


3.49 


3.92 


3.99 


3.97 


4.35 


4.78 


4.87 


4.88 


5.76 


6.50 


6.64 


6.65 




25 dB 


4.45 


4.92 


5.00 


4.99 


5.50 


6.00 


6.10 


6.09 


7.86 


8.65 


8.80 


8.78 




15 dB 


4.15 


4.33 


4.41 


4.42 


4.48 


4.62 


4.74 


4.77 


5.10 


5.35 


5.47 


5.50 


S 


20 dB 


4.25 


4.61 


4.70 


4.70 


4.95 


5.13 


5.25 


5.29 


6.07 


6.57 


6.72 


6.75 




25 dB 


4.93 


5.42 


5.51 


5.53 


5.87 


6.15 


6.27 


6.30 


7.77 


8.52 


8.69 


8.67 




15 dB 


3.52 


3.60 


3.67 


3.71 


3.79 


3.85 


3.95 


4.00 


4.22 


4.34 


4.48 


4.56 


Len; 


20 dB 


3.25 


3.42 


3.51 


3.55 


3.78 


3.89 


4.00 


4.04 


4.58 


4.88 


5.03 


5.11 




25 dB 


3.64 


3.91 


4.01 


4.06 


4.36 


4.52 


4.62 


4.68 


5.92 


6.29 


6.45 


6.51 




15 dB 


3.24 


3.52 


3.63 


3.71 


4.32 


4.39 


4.55 


4.66 


5.28 


5.19 


5.38 


5.52 




20 dB 


4.31 


4.35 


4.52 


4.67 


5.84 


5.75 


5.91 


6.03 


7.39 


7.09 


7.31 


7.48 


Oh 


25 dB 


5.69 


5.59 


5.78 


5.93 


7.26 


7.03 


7.20 


7.32 


9.01 


8.68 


8.92 


9.09 




15 dB 


4.01 


4.20 


4.30 


4.31 


4.15 


4.45 


4.57 


4.60 


4.77 


5.13 


5.25 


5.29 


Wal 


20 dB 


4.63 


4.88 


5.02 


5.06 


5.17 


5.48 


5.60 


5.66 


6.34 


6.65 


6.84 


6.90 




25 dB 


5.79 


6.01 


6.21 


6.28 


6.71 


6.88 


7.07 


7.14 


8.41 


8.78 


8.99 


9.08 




Fig. 2. Restoration of the Face image degraded by Gaussian blurring and Gaussian noise corresponding to a BSNR level of 
15 dB. (a) Degraded image (PSNR = 21.76 dB), (b) TV solution (PSNR = 25.36 dB), (c) HS^ solution (PSNR = 26.08 dB), 
and (d) HSi solution (PSNR = 26.21 dB) 
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(a) (b) (c) (d) 



Fig. 3. Restoration of the Kids image degraded by motion blurring and Gaussian noise corresponding to a BSNR level of 20 
dB. (a) Degraded image (PSNR = 21.84 dB), (b) TV solution (PSNR = 27.91 dB), (c) "H^g solution (PSNR = 28.56 dB), and 
(d) nSi solution (PSNR = 28.59 dB) 

Beyond the ISNR comparisons, the effectiveness of the proposed method can also be visually 
appreciated by inspecting the representative Face, Kids and House deblurring examples of Figures 2 
- 4, where we provide the results obtained using TV and two of our Hessian-based regularizers. From 
these examples we can verify our initial claims, that TV regularization leads to image reconstructions 
that suffer from the presence of heavy block artifacts. These artifacts become more evident in regions 
where the image is characterized by smooth intensity transitions, and they are responsible for shuffling 
details of the image and broadening its fine structures. These effects can give us an unnatural impression 
about the image. For example, the TV solution in Fig. 2 is closer to a cartoon-like version of the original 
image. On the other hand, even in cases where the presence of the noise is significant, the Hessian-based 
regularizers manage to avoid introducing pronounced artifacts and thus, they lead to reconstructions that 
are more faithful representations of the original content of the image. Distinguishing between the different 
Hessian Schatten norms is more subtle and requires a closer visual inspection. However, differences do 
exist and, for instance, we can observe that the TiSoo result in Fig. 2 seems more noisy and less regular 
than its HSi counterpart. 

C. Deblurring of Biomedical Images 

Our interest in image restoration is mostly motivated by the problem of microscopy image deblurring. In 
widefield microscopy, the acquired images are degraded by out-of-focus blur due to the poor localization 
of the microscope's PSF This severely reduces our ability to clearly distinguish fine specimen structures. 
Since a widefield microscope can be modeled in intensity as a linear space-invariant system [44], our 
assumption about the forward model in (7) can be considered valid and we can, thus, employ the proposed 
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(a) 




(c) 


(d) 



Fig. 4. Restoration of the House image degraded by uniform blurring and Gaussian noise corresponding to a BSNR level of 
25 dB. (a) Degraded image (PSNR = 20.43 dB), (b) TV solution (PSNR = 25.93 dB), (c) HS^ solution (PSNR = 26.43 dB), 
and (d) ■HS2 solution (PSNR = 26.53 dB) 



framework for the restoration of the underlying biomedical images. 

To evaluate the practical relevance of our approach, we provide deblurring experiments on two real 
images of fluorescence specimens. For each sample we acquired two image-stacks using a confocal 
microscope. This type of microscope can reject out-of-focus light using a small aperture in front of the 
detector and can thus avoid the blurring effect, but at the expense of more measurement noise. When 
the aperture is opened, the intensity of the incoming light is increased and the SNR is improved, but 
this time the measurements include interference from adjacent out-of-focus objects. In this case the final 
result is blurred and it is equivalent to an image acquired by a "cheaper" widefield microscope. For more 
details on the image acquisition we refer to [45]. 

The reported results refer to the restoration of the second type of image-stacks, with the first ones 
serving as visual references to evaluate the quality of the resulting estimates. The size of the image-stacks 
for the first specimen shown in Figs. 5(a) and 5(b) are 352 x 512 x 96 while the size of the image stacks 
for the second sample shown in Figs. 6(a) and 6(b) are 512 x 512 x 16. For the first specimen we have 
available two-channels, blue and green, but we only use the green one. From each of the image-stacks we 
obtained a single image to work with, by computing the average intensity with respect to the z-axis. We 
did the same to obtain a 2D PSF out of a standard diffraction-limited 3D PSF model using the nominal 
optical parameters of the microscope (numerical aperture, wavelength, optical zoom) [44]. 

In Figs. 5(c) and 5(d) we present the restored images using TV and T-LSi regularization, while in 
Figs. 6(c) and 6(d) we provide the restored images using TV and 7^52 regularization. From these two 
examples, if we compare the obtained results with the confocal acquisitions, we can verify that the 
Hessian-based solutions are quite successful in reveaUng the primary features of the specimens without 
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(c) (d) 
Fig. 5. Restoration results on a real fluorescent-cell image of size 352 x 512. Close-up of (a) widefield image, (b) reference 
confocal image, (c) TV reconstruction, (d) HSi reconstruction. The details of this figure are better seen in the electronic version 
of this paper by zooming on the screen. 



introducing severe artifacts, as opposed to TV which oversmooths certain features and wipes out important 
details of the image structure. Therefore, we conclude that our regularizers can do a better job, especially 
when one has to deal with images that consist mostly of ridges and filament-like structures, as is often 
the case in biomedical imaging. 

D. Sparse Image Reconstruction 

In sparse image reconstruction the observed image y is degraded by a masking operator which randomly 
sets pixel values to zero. This operator corresponds to a diagonal matrix A whose diagonal entries are 
randomly set to zero or one. We refer to this problem as sparse because in our experiments we consider 
masking operators that retain only 2%, 5%, 8% and 10% of the initial pixel values. Note that this problem 
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(c) (d) 
Fig. 6. Restoration results on a real fluorescent-cell image of size 512 x 512. Close-up of (a) widefield image, (b) reference 
confocal image, (c) TV reconstruction, (d) 'HS2 reconstruction. The details of this figure are better seen in the electronic version 
of this paper by zooming on the screen. 



can be considered as compressive sensing if we assume that the image is sparse in the spatial domain. 

The reported experiments are conducted on the gray-level images: Boat, Hill, Lena and Peppers. The 
masked images are then reconstructed using the same regularizers as before plus two quadratic regularizers 
based on the gradient and the Laplacian operators, respectively. In this setting we do not consider any 
presence of noise and thus, for all the regularizers under comparison, we use the same regularization 
parameter r = 10~^. The value of r is chosen to be small to ensure that the results will be consistent, in 
the sense that the reconstruction methods will not alter the unmasked pixel values. However, due to the 
small value of the regularization parameter, we have observed that the convergence of the minimization 
task for all the regularizers can be slow and thus more than 100 iterations are required. To cope with 
this problem, for the non-quadratic regularizers, we apply a simple continuation scheme that significantly 
speeds up the convergence of the minimization algorithm: We start with a large value for r and then we 
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TABLE 11 

PSNR COMPARISONS ON SPARSE IMAGE RECONSTRUCTION FROM RANDOM SAMPLES FOR 4 RATIOS OF OBSERVED PIXELS 



Observed 
pixels % 


Grad. 


Lap. 


TV 


ns 


'HS2 


HSi 




2% 


21.29 


20.83 


18.52 


21.54 


21.56 


21.55 




5% 


22.96 


22.87 


21.22 


23.27 


23.31 


23.33 


CB 


8% 


23.92 


24.03 


22.43 


24.27 


24.33 


24.37 




10% 


24.51 


24.67 


23.00 


24.92 


25.00 


25.06 




2% 


22.53 


22.39 


19.68 


22.97 


23.00 


23.01 




5% 


24.27 


24.33 


22.13 


24.70 


24.76 


24.77 


X 


8% 


25.34 


25.49 


23.69 


25.75 


25.79 


25.80 




10% 


25.90 


26.07 


24.52 


26.29 


26.34 


26.35 




2% 


22.33 


22.79 


18.89 


23.23 


23.34 


23.38 


G 


5% 


24.60 


25.42 


22.42 


25.56 


25.70 


25.82 


^ 


8% 


25.98 


26.96 


24.37 


26.96 


27.11 


27.24 




10% 


26.63 


27.64 


25.28 


27.62 


27.77 


27.90 




2% 


18.58 


18.50 


15.68 


19.20 


19.27 


19.32 


u 

u 

a 


5% 


20.73 


20.93 


18.31 


21.53 


21.64 


21.71 


a 


8% 


21.71 


22.21 


19.97 


22.49 


22.60 


22.66 




10% 


22.41 


22.90 


20.92 


23.18 


23.32 


23.39 



gradually decrease it to reach the chosen value. We observe experimentally that following this strategy and 
using 200 MFISTA iterations (we still solve the corresponding denoising problems using 10 iterations) 
we can solve the problem to high accuracy. Regarding the two quadratic regularizers, we minimize their 
objective functions using the conjugate gradient method [46] with a maximum of 2000 iterations. 

In Table II we report the reconstruction results we obtained for all the employed regularization 
techniques. The quality of the reconstructions is measured in terms of PSNR. As we can observe from 
this table, TV does not perform well in this problem and its reconstructions fall far behind, including 
the two quadratic regularization techniques. On the other hand, our Hessian Schatten norm regularizers 
behave much better and in all cases they lead to estimates that outperform the other methods. As in the 
image restoration case, the T-LSi regularizer leads to the best reconstructions while the 1-132 regularizer 
follows rather closely. In Fig. 7 we present a representative example of the reconstruction of the Peppers 
image from 2% observed pixels using Laplacian-based quadratic regularization, TV and TiSi. From this 
example it is clear that TV cannot produce an acceptable result but instead leads to a piecewise constant 
solution that does not reveal any features of the image. On the other hand, both the quadratic and the 
proposed regularizer provide meaningful reconstructions with the latter achieving a better performance. 
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(a) (b) (c) (d) 

Fig. 7. Sparse reconstruction of the Peppers image from 2% observed pixels, (a) Masked image, (b) Laplacian-based quadratic 
solution (PSNR = 18.50 dB), (c) TV solution (PSNR = 15.68 dB), and (d) nSi solution (PSNR = 19.32 dB) 




(a) (b) (C) (d) (e) 

Fig. 8. Image interpolation. Close-up of (a) High-resolution image, (b) low-resolution image, (c) Laplacian-based quadratic 
solution (PSNR=27.74 dB), (d) TV solution (PSNR=23.95 dB), and (e) USi solution (PSNR=27.92 dB) 




Fig. 9. Image zooming. Close-up of (a) High-resolution image, (b) low-resolution image, (c) gradient-based quadratic solution 
(PSNR = 25.95 dB), (d) TV solution (PSNR = 26.00 dB), and (e) H52 solution (PSNR = 26.08 dB) 

E. Image Interpolation and Image Zooming 

Image interpolation and image zooming fall into the same class of linear inverse problems. As in the 
sparse image reconstruction case, a degradation factor of the underlying image is a masking operator that 
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TABLE III 

PSNR COMPARISONS ON IMAGE INTERPOLATION AND IMAGE ZOOMING FOR A 4x DOWNS AMPLING FACTOR 





Images 


Grad. 


Lap. 


TV 






nSi 


e 
_o 


Boat 


24.03 


24.19 


21.90 


24.43 


24.45 


24.45 


lolat 


Hill 


25.48 


25.62 


23.31 


25.83 


25.84 


25.83 


Interp 


Lena 


26.57 


27.74 


23.95 


27.79 


27.89 


27.92 


Peppers 


21.95 


22.13 


19.48 


22.55 


22.63 


22.68 




Boat 


25.95 


25.89 


26.00 


26.01 


26.08 


26.12 


UIUI 


Hill 


27.32 


27.27 


27.22 


27.31 


27.34 


27.35 


Zoo 


Lena 


29.25 


29.26 


29.50 


29.40 


29.54 


29.61 




Peppers 


24.10 


24.13 


24.32 


24.69 


24.76 


24.77 



zeros out some of the image pixel values. However, in these two cases the masking operator corresponds 
to subsampling and is highly structured, as opposed to the random masking operator that we used above. 
The difference between the two considered forward models is that image interpolation involves only the 
subsampling operator and therefore results in observed images that suffer from aliasing, while image 
zooming involves additionally an antialiasing operator that is applied to the underlying image before the 
subsampling takes place. In the last case, the matrix A can be expressed as A = SF where F corresponds 
to the filtering operation and S to the subsampling of the resulting image. Once more, we do not consider 
any presence of noise and we thus use the same regularization parameter and minimization strategy as 
above. The experiments we present are conducted on the same four images as in Section IV-D, using 
the same regularizers for a downsampling factor of 4. Finally, regarding the antialiasing filter we use a 
Gaussian kernel of support 9x9 and standard deviation at = 1-4. 

In Table III we report the obtained results and we evaluate the quality of the estimates in terms of PSNR. 
Regarding the interpolation problem we observe that TV, similarly to the sparse image reconstruction 
case, does not perform well and produces the worst scores. However, its performance gets significantly 
better in the image zooming case where an antialising filtering is applied. This is an indication that 
TV cannot perform at a satisfactory level when the operator acting on the image does not involve a 
mixing effect. On the other hand, the performance of the proposed regularizers is more robust to the 
nature of the operator degrading the image, and they lead to the best reconstructions. To have a visual 
performance assessment, we present in Figs. 8 and 9 interpolation and zooming results on the Lena and 
Boat image, respectively. These results confirm our previous conclusions about the performance of TV 
and demonstrate the superiority of the Hessian-based regularizers over the other regularizers. 
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V. Conclusion 

In this paper we introduced a new family of convex non-quadratic invariant regularizers that can 
potentially lead to improved results in inverse imaging problems. These regularizers incorporate second- 
order information of the image and depend on the Schatten norms of the Hessian. We further designed 
an efficient projected gradient algorithm for minimizing the corresponding objective functions, under 
additional convex constraints on the solution. The computational complexity of the proposed algorithm is 
somewhat higher than that of TV, due to the use of a matrix instead of a vectorial operator, but it is highly 
parallelizable. We also presented a new result that relates vector projections onto iq norm balls and matrix 
projections onto Schatten norm balls. This connection enabled us to design an efficient matrix-projection 
method, which is a fundamental ingredient of our optimization algorithm. 

The performance and practical relevance of the proposed regularization scheme was assessed for several 
linear inverse imaging problems, through comparisons on simulated and real experiments with derivative- 
based quadratic and the TV regularizer. The results we obtained are promising and competitive both in 
terms of SNR improvement and visual quality. 

Appendix A 

A. Proof of Theorem 1 

By taking the domain Q to be a disk, the rotation invariance of TZ (/) implies that 

7^(/(R,•)) =7^(/) (38) 

where is a rotation matrix. In particular, (38) must hold for all functions, including those of the form: 
/ (r) = ari + /3r2, with r G and a,/3 G M. Their gradient is constant and equal to V/ (r) = [^]. 
Now, using / as defined above, we write the l.h.s of (38) as 

7^(/(R,•))= / <I>(V{/(R,-)}(r))dr 
Jn 

^ (R^V/ (Rer)) dr 

= / <I>(Rn^])dr 
Jn 

= (^Va2 + /32 [sin(^ + ^0 sin(^ + ^' + f)]'^)dr, (39) 

where 9' = sgn (a) arccos (^--j^=^. Setting = ^ — 9' in (39) and combining the result with Prop- 
erty (38), we immediately get that 

$(x) = $(|x|),VxGM2. (40) 
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The scaling invariance of IZ can be restated as 

7^,(/(a•)) = a^7^(/) 

/ <I>(V{/(a-)}(r))dr = a'^ / $(V/(r))dr (41) 

for some a > 0, and an exponent fi £ R. This property must hold for the functions of the form: 
/ (r) = ari, with r G and a G M. The magnitude of their gradient is constant and equal to 
|V/ (r)| = |a|. Now, using / as defined above and the result of (40), we write (41) as 

= / ^>(a|a|)dr-a^ [ «>(|a|)dr. (42) 

Therefore, we directly have that 



^{a\a\) = a''^{\a\),yaeR, (43) 

with u = fi + 2. Now, we define the function 

^•0(0) = (44) 

which is homogeneous of degree 0. This implies that $0(0) = c, where c is an arbitrary constant. 
Therefore, the potential functions $ that satisfy (43), are necessarily of the form: ^ (•) = c\-\'^. 
The inverse statement of the theorem can be verified by substitution, using the property 

V{/(a-)}(r) = aV{/}(ar), V/. (45) 



B. Proof of Theorem 2 

By taking the domain J] to be a disk, the rotation invariance of TZ (/), as defined in (38), must hold 
for all functions, including those of the form: / (r) = ^rf + + 7rir2, with r G and a, /3, 7 G M. 
Their Hessian is constant and equal to Hf (r) = [ " ^ ] - Now, using / as defined above, we write the 
l.h.s of (38) as 

7^(/(Re•))= / ^CH{/(Rr)}(r))dr 

= [ $(Rj?^/(R,r)Re)dr 
Jn 

= [ «l>(Rj[;^]Re)dr. (46) 
Jn 

According to the spectral decomposition theorem, A = [ ° ^ ] , being a symmetric matrix has an eigenvalue 
decomposition. This implies that there exists a rotation 6' such that R^ ARg- = A, where A is a diagonal 
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matrix consisting of the eigenvalues of A. These are denoted as Afc, where k = 1,2. Based on this 
observation and combining it with Property (38), we immediately get that 

$(A) = $(Ai,A2),VA G m2x2^ (47) 

which implies that $ should be a function of the Hessian eigenvalues. 
The scaling invariance of TZ can be restated as 

7^,(/(a•)) = a^7^(/) 

/ c^(?^{/(a-)}(r))dr = a^ / $(?^/(r))dr (48) 
Jn/a Jn 

for some a > 0, and an exponent /i G M. This property must hold for the functions of the form: 
/ (r) = |rf + |r|, with r G and a, /3 G M. Their Hessian is constant and equal to (r) = [ o /?] ■ 
Now, using / as defined above and the result of (47), we write (48) as 



[ $ (a^a, 0^/3) dr = / <l>{a,/3)dr. (49) 
Jn/a Jn 

Therefore, we have that 

$ (a^a, 0^/3) = a^+2^> (q, /3) , Vq, /3 G M . (50) 

Now, we define the function 

$o(x) = ^,VxGM2, (51) 
l|x||p 

where p > 1 and v = is homogeneous of degree and thus $o (x) = (x/ ||x|M . Therefore, 



the potential functions $ that satisfy (50), are necessarily of the form: <I> (x) = $0 (^x/||x||pj ||x||p. 
Finally, since x represents the vector of the eigenvalues of the Hessian, its ip norm corresponds to the 
Sp norm of the Hessian itself. 

The inverse statement of the theorem can be verified by substitution, using the property 

n{f{a-)}{r) = a^n{f}{ar),yf. (52) 
Appendix B 

A. Adjoint of the Disrcete Hessian Operator 

To find the adjoint of the discrete Hessian operator, we exploit the relation of the inner products of 
the spaces and X in (13). Using (11), we can equivalently write (13) as 

N N 

J]tr ([^x]^ Y„) =Y.Xn [n*Y]^ . (53) 

n=l n=l 
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We then expand the l.h.s of (53), to obtain 

N N 



n=l 



n=l 

N 



[A,,,,x]„y(1'1) + [A,,,,x]„ y(1'2) + Yi^.i) + [A,,,,x]„ Yi^ 



(2,2) 



n=l 
AT 



n=l 



J^xJ A;,Y(1'^) + a;, f Y^2) + Y(2-i) 



+ 



A* y(2.2) 



(54) 



Note that Y^*'-^^ corresponds to the vector that is composed of the (z, j) entries of all Y„ G M^^^ matrices. 
Now, by comparing the rh.s of (53) to the r.h.s expansion of (54), it is straightforward to verify that the 
adjoint of the discrete Hessian operator is indeed computed according to (14). 



B. Proof of Proposition 1 

By definition, the orthogonal projection of a matrix Y onto the set Bs^ is given by 

'Pbs, (Y) = argmin ||X - Y||^ . 

I|X|U,<P 

Since all Schatten norms are unitarily invariant, we equivalently have 



ars min 1 1 U^X V - YV 1 1 J, 



(55) 



(56) 



||U«XV|j5^<p 

Let us now consider the matrix Z = U^XV that is associated with the solution of (55). If we substitute 
Z in (56), then we end up with the following constrained minimization problem 

|2 



^Bs„ (^) = argmin ||Z - S||p , 

iizilc <P 



(57) 



which corresponds to the projection of the diagonal matrix XI onto the set Bs^- Now, if Vbs^ (^) = 
we have 



> 



+ WE\\p - 2Re (tr ( Z^S 



F 

s - s 



+ \m 



2tr 



(58) 



where the inequality stems from Von-Neumann's trace theorem [47], and S is a diagonal matrix with 
the singular values of Z. In addition, according to the definition of the Schatten norms, it holds that 



< 



(59) 
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Equations (58) and (59) immediately imply that the projection of Xl equals to Z = S, i.e., a positive 
semidefinite diagonal matrix. We can then perform this operation by projecting the vector, formed by 
the main diagonal of S, onto the convex set Bq, and then by transforming the projected vector back to 
a diagonal matrix. Using this fact and the relations between the optimal solution of (55) and (57), we 
finally express the projection of the matrix Y onto Bs^ as 

= Udiag(pB,(o-(Y))) V^. (60) 

C. Proof of Lemma 1 

First, we present a matrix inequality that involves the Schatten norms and it will be subsequently used 
for the proof of the lemma. Let X, Y G £n,ixn2_ xhen, the inner product of these two matrices satisfies 
the following inequality 

(X , Y)c„,><,.. = Re (tr (Y^X)) < {a (X) , a (Y)), 

<\\a{X)\\Ja{Y)\\^ = \\X\\sJ\Y\\s^. (61) 

The first inequality is due to the Von-Neumann's trace theorem [47], while the second one due to Holder's 
inequality. The last equality holds true from the definition of Schatten norms. 
By definition, the dual norm of (21) is given by [32]: 

lirilln = max ($1, , (62) 

where A!, instead of ]R^x2x2 jj^^j used throughout the paper, here is assumed to be the more general 
linear space X = ([;;^x"iX"2 consider the inequality 

N N 
{n,^)^ = ^Re{tr{^i!nn)) <^\\^n\\sj\'^n\\s, , (63) 

n=l n=l 

which immediatelly follows from inequality (61). Now, by introducing the vectors u = 
[W^iWs, ' ll^sll^, , • • • , II^^Jvll^,) and = (H^iH^^ , ||*2|U^ , • • • , H^ivll^J, and applying once again 
Holder's inequality, we get 

N 

W^nWs^ W^Js, = , < ||a;||^ IIV'lli = II^IL,, \mi,p ■ (64) 

n.=l 

From the definition of the dual norm (62) and the inequalities (63) and (64) we conclude that 
II^^IId — ll^lloof/- prove that ||ri||^ = we next show that for each we can find a 'I' 

satisfying H^H^p = 1, and for which {Ct , ^')^ = To that end, let k be any index in the set 

September 18, 2012 DRAFT 



31 



|argmax^<„<;^ ll^nll^ | and flk = U^XlfeV^ be the singular value decomposition of ftk- Then, we 
set = O for all 7i except for n = A: for which we have 



*fc = UfcDVf , (65) 



where 



and S^!'-'^ corresponds to the (i,j)-th entry of the matrix I]^ G j^mm(ni,n2) have that 

N 



{n , = J^Re (tr = Re (tr {^^n,}) 

i(ni,ri2) . , 



n=l 

minf'r).! rj^^l 



= W^uWs, = m^^^ . (67) 
Furhermore, for the mixed norm ||*||i ,„ = 11 '^'i lie it holds that 



1/p 



'min(ni,n2) \ S ( ^ 



t=l I II A. 115 



. i=l ^ W 



lO HO 



T = 1 > (68) 



which completes the proof of the lemma. 



D. Proof of Proposition 2 

For any pair of variables , ^ G Af we have 

||V. (fi) - Vs m\x = Wr-H [Vc (z - rWil) - Vc {z - t?^**)]||;, 
< r ll^ll \\Vc (z - T-H*n) -Vc{z- tH*^ 



2 



<t\\'H\\ Wt-H* (fi- *) 



12 



< ll^ll 11^*11 \\ft - 

= Wnf \\n - ^W;^ . (69) 
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Note that, the first and the third inequalities follow from the relation between the norms, defined in the 
spaces X and M^, and the induced operator norm, i.e., ||'Kx||;(^ < ||^|| ||x||2, while the second one holds 
because the projection operator Vc onto the convex set C C M^, is firmly nonexpansive [48, Proposition 
4.8]. This means that 

||7'c(x)-Pc(y)|l2< l|x-y||2 Vx,yGM^. (70) 

To compute an upper bound of we exploit that H'KH^ = [49] (a general property of 

bounded linear operators), and we get 

— II ^ ^ ||^rir2^''i''2 II + 1 1 ^r2r2 ^''2r2 1 1 ) l|x||2 

= (||A,,,J|2 + 2||A,,,,f + ||A,,,,f) ||x||2 . (71) 

Now, using the definitions of the second-order differential operators in (10), it is easy to show that each 
of ||A,.j,.J|, ||A,.j,.J| and ||A,.2,.J| is smaller than or equal to 4. This immediately implies that \\'H\\ < 8 
and, hence, an upper bound of the Lipschitz constant of Vs (fl) will be L (s) < \\'H\\'^ < 64r^. 
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